- Polynomial regression
- Step functions
- Regression splines
- Smoothing splines
- Generalized additive models
10/29/2019
\[\hat{f}(x) = \hat{\beta}_0 + \hat{\beta}_1 b_1(x)+ \hat{\beta}_2 b_2(x)+ \dots + \hat{\beta}_{K+3} b_{K+3}(x)\]
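As a hedged sketch (simulated data; the knot locations are illustrative), the fit above is just least squares on the spline basis columns:

```r
# Hedged sketch: cubic regression spline via its basis expansion
library(splines)

set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# bs() builds the K + 3 cubic-spline basis columns b_1(x), ..., b_{K+3}(x)
fit <- lm(y ~ bs(x, knots = c(2.5, 5, 7.5)))  # K = 3 internal knots
length(coef(fit))  # K + 4 = 7 coefficients, including the intercept
```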
In R, you just need to change bs to ns!
One strategy is to decide \(K\), the number of internal knots, and then place them at appropriate quantiles of the observed \(X\)
A default choice is to add knots at the boundaries (total knots = \(K+2\))
Given \(K\) internal knots, there are \(K+1\) subintervals and \(d + K + 1\) degrees of freedom (\(\beta_0, \beta_1, \dots, \beta_{d+K}\))
A cubic spline with \(K\) internal knots has \(K + 4\) parameters or degrees of freedom
A natural spline with \(K\) internal knots has \(K\) degrees of freedom (the \(K+4\) cubic-spline parameters minus 4 boundary constraints, two at each end)
Fix a degree \(d\), and use cross-validation to choose the number of knots!
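A hedged sketch of this strategy (simulated data; variable names are illustrative), using 10-fold cross-validation to pick the natural-spline degrees of freedom:

```r
# Hedged sketch: choose the ns() df by 10-fold cross-validation
library(splines)  # ns()
library(boot)     # cv.glm()

set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)
dat <- data.frame(x, y)

# CV estimate of test MSE for each candidate df
cv.err <- sapply(2:10, function(k) {
  fit <- glm(y ~ ns(x, df = k), data = dat)
  cv.glm(dat, fit, K = 10)$delta[1]
})
best.df <- (2:10)[which.min(cv.err)]
```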
Splines can also be used when the response variable is qualitative. For example, consider the logistic regression model
\[\log \left( \frac{p}{1-p} \right) = f(x) = \sum_{k=0}^{K + d} \beta_k b_k(x)\]
Once the basis functions have been defined, we just need to estimate coefficients \(\beta_k\) using a standard logistic regression procedure.
A smooth estimate of the conditional probability \(P(Y = 1 \mid x)\) can then be used for classification.
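A hedged sketch of this idea (simulated data; names are illustrative): glm() with family = binomial estimates the \(\beta_k\) for the spline basis exactly as in ordinary logistic regression.

```r
# Hedged sketch: logistic regression on a natural-spline basis
library(splines)

set.seed(1)
x <- runif(300, -3, 3)
p <- plogis(sin(2 * x))   # true P(Y = 1 | x)
y <- rbinom(300, 1, p)

fit <- glm(y ~ ns(x, df = 4), family = binomial)

# smooth estimate of P(Y = 1 | x) on a grid, usable for classification
grid <- data.frame(x = seq(-3, 3, length.out = 100))
phat <- predict(fit, newdata = grid, type = "response")
```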
Consider this criterion for fitting a smooth function \(g(x)\) to some data:
\[\text{argmin}_{g \in \mathbb{S}} \left\{ \sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g^{\prime \prime} (t)^2 dt \right\}\]
In R: smooth.spline(X, Y, df = 10)
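A slightly fuller hedged sketch (simulated data) showing two ways of controlling the smoothness: fixing the effective degrees of freedom directly, or letting leave-one-out cross-validation pick \(\lambda\).

```r
# Hedged sketch: smoothing splines in R
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

fit1 <- smooth.spline(x, y, df = 10)    # fix the effective df directly
fit2 <- smooth.spline(x, y, cv = TRUE)  # choose lambda by leave-one-out CV
fit2$df                                 # effective df selected by LOOCV
```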
Allows for flexible nonlinearities in several variables, but retains the additive structure of linear methods: we calculate a separate \(f_j\) for each \(X_j\), and then add together all of their contributions.
\[y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip}) + \varepsilon_i\]
gam(mpg ~ ns(horsepower, df = 5)+ns(acceleration, df = 5)+year)
gam(mpg ~ ns(horsepower, df = 5) : ns(acceleration, df = 5))
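A hedged, self-contained version of the first fit (assuming the Auto data from the ISLR package and the gam package; both are assumptions, not stated in the slides):

```r
# Hedged sketch: additive model on the Auto data
library(gam)      # gam()
library(splines)  # ns()
library(ISLR)     # Auto data (assumed)

fit <- gam(mpg ~ ns(horsepower, df = 5) + ns(acceleration, df = 5) + year,
           data = Auto)

# one partial-effect panel per term, with pointwise standard-error bands
par(mfrow = c(1, 3))
plot(fit, se = TRUE)
```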
Holding acceleration and manufacturing year fixed, fuel efficiency tends to decrease with horsepower
\[\log\left( \frac{p_i}{1-p_i} \right) = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip})\]
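A hedged sketch of fitting such a logistic GAM in R (the Wage data from the ISLR package and the specific terms are assumptions for illustration):

```r
# Hedged sketch: logistic GAM for a binary response
library(gam)
library(splines)
library(ISLR)  # Wage data (assumed)

# I(wage > 250) creates a binary response; family = binomial gives the
# logit link, so each term is an f_j on the log-odds scale
fit <- gam(I(wage > 250) ~ year + ns(age, df = 5) + education,
           family = binomial, data = Wage)
par(mfrow = c(1, 3))
plot(fit, se = TRUE)
```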